Client Report - Project 0: Introduction

Course DS 250

Author

Maia Faith Chambers

import pandas as pd
import numpy as np
from lets_plot import *


LetsPlot.setup_html(isolated_frame=True)
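from palmerpenguins import load_penguins

# Load the Palmer Penguins dataset into a DataFrame
df = load_penguins()

# Bar chart of penguin counts per species, then a preview of the first five rows
ggplot(df, aes(x="species")) + geom_bar()
df.head()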
   species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0   Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1   Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2   Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3   Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN  2007
4   Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007

Include the tables created from PY4DS: CH2 Data Visualization used to create the above chart

PY4DS: CH2 Data Visualization

# Load the Palmer Penguins dataset and preview the first five rows
penguins = load_penguins()
penguins.head()
   species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0   Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1   Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2   Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3   Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN  2007
4   Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007

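The tabular view shows each variable as a column and each penguin as a row, which makes the data easy to scan. The variables used in the charts below are species, flipper_length_mm, and body_mass_g. Note that row 3 has no measurements recorded (NaN).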
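As a quick check of the table above, the three variables can be pulled out on their own and scanned for missing values. The sketch below rebuilds only the five preview rows shown in the output (not the full dataset), so it runs without the palmerpenguins package. The names `preview` and `complete` are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# The five preview rows from the head() output above, limited to the
# three variables used in the charts. This is NOT the full dataset.
preview = pd.DataFrame(
    {
        "species": ["Adelie"] * 5,
        "flipper_length_mm": [181.0, 186.0, 195.0, np.nan, 193.0],
        "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
    }
)

# Count missing values per column; row 3 has no measurements recorded
missing_per_column = preview.isna().sum()
print(missing_per_column)

# Keep only the rows where both measurements are present
complete = preview.dropna(subset=["flipper_length_mm", "body_mass_g"])
print(len(complete), "complete rows out of", len(preview))
```

On the full dataset, the same two calls (`isna().sum()` and `dropna(subset=...)`) can be applied to `penguins` directly.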

Recreate the example charts from PY4DS: CH2 Data Visualization of the textbook.

The question I’d like to answer for practice: do penguins with longer flippers weigh more or less than penguins with shorter flippers? (Hint: copy the chart code from 2.2.3. Creating a Plot, one for each cell below)

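# Bar chart of penguin counts per species
ggplot(penguins, aes(x="species")) + geom_bar()

# Empty canvas: the data is attached, but no layers have been added yet
ggplot(data=penguins)

# Scatterplot of body mass against flipper length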
(
    ggplot(data=penguins, mapping=aes(x="flipper_length_mm", y="body_mass_g"))
    + geom_point()
)

This first plot is very simple. It is hard to tell the species apart, though, because nothing in the plot indicates which dots belong to which species.

(
    ggplot(
        data=penguins,
        mapping=aes(x="flipper_length_mm", y="body_mass_g", color="species"),
    )
    + geom_point()
)

This one is a lot better because it distinguishes the penguin species by color. The red dots are Adelie, the blue are Gentoo, and the green are Chinstrap penguins. This lets us compare body mass with flipper length by species. Looking at the graph, it’s evident that Gentoo penguins tend to have higher body mass and longer flippers than the other two species.
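The visual impression can be checked with a per-species summary. This is a minimal sketch: `mean_by_species` is a hypothetical helper, and the tiny `demo` frame holds made-up placeholder values (with placeholder species names "A" and "B") used only to show the call. They are not real penguin measurements.

```python
import pandas as pd


def mean_by_species(df: pd.DataFrame) -> pd.DataFrame:
    """Average flipper length and body mass for each species."""
    cols = ["flipper_length_mm", "body_mass_g"]
    # mean() skips NaN values by default, so incomplete rows do not break the summary
    return df.groupby("species")[cols].mean()


# Placeholder values only (NOT real penguin measurements), to show the call
demo = pd.DataFrame(
    {
        "species": ["A", "A", "B", "B"],
        "flipper_length_mm": [180.0, 190.0, 210.0, 220.0],
        "body_mass_g": [3500.0, 3700.0, 5000.0, 5200.0],
    }
)
summary = mean_by_species(demo)
print(summary)
```

Calling `mean_by_species(penguins)` on the real data gives one row per species, which can be compared against the scatterplot.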

(
    ggplot(data=penguins, mapping=aes(x="flipper_length_mm", y="body_mass_g"))
    + geom_point(mapping=aes(color="species"))
    + geom_smooth(method="lm")
)

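This graph adds a linear regression line, drawn by geom_smooth(method="lm"), through all the points, while only the points are colored by species. It is a clear improvement, but the next one is the best.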

(
    ggplot(data=penguins, mapping=aes(x="flipper_length_mm", y="body_mass_g"))
    + geom_point(mapping=aes(color="species", shape="species"))
    + geom_smooth(method="lm")
)

This one maps both shape and color to species, which lets us tell the species apart visually a lot faster. The pink line through the points is a linear regression smoothing line. Conclusion: penguins with longer flippers do tend to have greater body mass.
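For reference, the straight line drawn by geom_smooth(method="lm") is a linear (least-squares) fit of body mass on flipper length. The sketch below shows that kind of fit using NumPy's `polyfit`, which is a stand-in chosen here and not the code lets_plot runs internally. `fit_line` is a hypothetical helper, and the check uses synthetic points on a known line rather than penguin data.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares slope and intercept of y on x, ignoring missing pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Drop pairs where either value is NaN before fitting
    keep = ~(np.isnan(x) | np.isnan(y))
    slope, intercept = np.polyfit(x[keep], y[keep], deg=1)
    return slope, intercept


# Synthetic points on y = 2x + 1 (NOT penguin data), plus one missing pair
xs = [0.0, 1.0, 2.0, 3.0, np.nan]
ys = [1.0, 3.0, 5.0, 7.0, np.nan]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With the real data, `fit_line(penguins["flipper_length_mm"], penguins["body_mass_g"])` would return the slope in grams per millimeter. A positive slope would match the conclusion above.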